Reducing Rule Covers with Deterministic Error Bounds
Abstract
The output of boolean association rule mining algorithms is often too large for manual examination. For dense datasets, it is often impractical even to generate all frequent itemsets. The closed itemset approach handles this information overload by pruning “uninteresting” rules, based on the observation that most rules can be derived from other rules. In this paper, we propose a new framework, namely, the generalized closed (or g-closed) itemset framework. By allowing a small tolerance in the accuracy of itemset supports, we show that the number of such redundant rules is far greater than previously estimated. Our scheme can be integrated into both levelwise algorithms (such as Apriori) and two-pass algorithms (such as ARMOR). We evaluate its performance by measuring the reduction in output size as well as in response time. Our experiments show that incorporating g-closed itemsets provides significant performance improvements on a variety of databases.
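The abstract does not give the formal definition of a g-closed itemset, so the Python sketch below assumes one plausible reading for illustration only: an itemset is dropped when some retained proper superset has support within a tolerance of its own. With a tolerance of 0 this reduces to the ordinary closed frequent itemsets; with a positive tolerance, each dropped itemset's support is recoverable from a retained superset with error at most that tolerance. The function names `frequent_supports` and `prune_with_tolerance` are hypothetical, and support counting here is brute force rather than the Apriori or ARMOR integration the paper describes.

```python
from itertools import combinations


def frequent_supports(transactions, min_support):
    """Brute-force support counting for every itemset with support >= min_support.

    Hypothetical helper for illustration; not Apriori or ARMOR.
    """
    items = sorted(set().union(*transactions))
    supports = {}
    for k in range(1, len(items) + 1):
        level = {}
        for cand in combinations(items, k):
            s = frozenset(cand)
            count = sum(1 for t in transactions if s <= t)
            if count >= min_support:
                level[s] = count
        if not level:
            break  # downward closure: no frequent k-itemsets, so none of size k+1
        supports.update(level)
    return supports


def prune_with_tolerance(supports, tolerance=0):
    """Drop an itemset when a retained proper superset has support within `tolerance`.

    tolerance=0 yields the ordinary closed frequent itemsets. For tolerance > 0
    (an assumed reading of the g-closed idea, not the paper's exact definition),
    every dropped itemset's true support lies in [kept[t], kept[t] + tolerance]
    for some retained superset t.
    """
    kept = {}
    # Visit larger itemsets first so all proper supersets are already decided.
    for s in sorted(supports, key=len, reverse=True):
        sup = supports[s]
        if not any(s < t and sup - kept[t] <= tolerance for t in kept):
            kept[s] = sup
    return kept


if __name__ == "__main__":
    transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"},
                    {"a", "c"}, {"a", "b", "c", "d"}]
    supports = frequent_supports(transactions, min_support=2)
    print(len(supports))  # 7
    print(sorted("".join(sorted(s)) for s in prune_with_tolerance(supports)))
    # ['a', 'ab', 'abc', 'ac']
    print(sorted("".join(sorted(s)) for s in prune_with_tolerance(supports, 1)))
    # ['a', 'abc']
```

On this toy data, exact closure keeps 4 of the 7 frequent itemsets, while a tolerance of 1 keeps only 2. The itemset `a` (support 5) survives because its only retained superset `abc` (support 3) differs by more than the tolerance. Processing larger itemsets first is what makes the error bound deterministic: an itemset is only ever pruned against a superset that is itself retained.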
Similar resources
A Novel Qualitative State Observer
The state estimation of a quantized system (Q.S.) is a challenging problem for designing feedback control and model-based fault diagnosis algorithms. The core of a Q.S. is a continuous variable system whose inputs and outputs are represented by their corresponding quantized values. This paper is concerned with state estimation of a Q.S. by a qualitative observer. The presented observer in this pape...
Reduced-order performance of parallel and series-parallel identifiers with weakly observable parasitics
The stability properties of discrete-time parallel and series-parallel identifiers with respect to a specific model-plant order mismatch are analyzed. While the two schemes give identical results in a deterministic environment with no modeling error, their performance differs in a deterministic environment with modeling error. We assume a singularly perturbed state representation...
Adaptive integration for multi-factor portfolio credit loss models
We propose adaptive integration algorithms for calculating the tail probability in multi-factor credit portfolio loss models. We first devise the classical Genz-Malik rule, a deterministic multiple integration rule suitable for portfolio credit models with fewer than 8 factors. Later on we arrive at the adaptive Monte Carlo integration, which simply replaces the deterministic int...
Stochastic and Deterministic Approaches to Estimation in H∞
This paper examines system identification methods from frequency response data that have recently emerged under the title of ‘Estimation in H∞’. We briefly review this work and examine in detail the effects of model order on linear algorithms. This leads to a model order selection criterion which has not been previously discussed in the literature. All the existing literature in this area examine...
Rectangle Size Bounds and Threshold Covers in Communication Complexity
We investigate the power of the most important lower bound technique in randomized communication complexity, which is based on an evaluation of the maximal size of approximately monochromatic rectangles, with respect to arbitrary distributions on the inputs. While it is known that the 0-error version of this bound is polynomially tight for deterministic communication, nothing in this direction ...
Publication date: 2003